-
Schlick, Tamar (Ed.) Dictionary learning (DL), implemented via matrix factorization (MF), is commonly used in computational biology to tackle ubiquitous clustering problems. The method is favored for its conceptual simplicity and relatively low computational complexity. However, DL algorithms produce results that lack interpretability in terms of real biological data. Additionally, they are not optimized for graph-structured data and hence often fail to handle such data in a scalable manner. To address these limitations, we propose a novel DL algorithm called online convex network dictionary learning (online cvxNDL). Like classical DL algorithms, online cvxNDL is implemented via MF; unlike them, it is designed to handle extremely large datasets by virtue of its online nature. Importantly, it enables the interpretation of dictionary elements, which serve as cluster representatives, through convex combinations of real measurements. Moreover, the algorithm can be applied to data with a network structure by incorporating specialized subnetwork sampling techniques. To demonstrate the utility of our approach, we apply online cvxNDL to 3D-genome RNAPII ChIA-Drop data with the goal of identifying important long-range interaction patterns (long-range dictionary elements). ChIA-Drop probes higher-order interactions and produces data in the form of hypergraphs whose nodes represent genomic fragments and whose hyperedges represent observed physical contacts. Our hypergraph model analysis aims to create an interpretable dictionary of long-range interaction patterns that accurately represent global chromatin physical contact maps. Using the dictionary information, one can also associate the contact maps with RNA transcripts and infer cellular functions. To this end, we focus on RNAPII-enriched ChIA-Drop data from Drosophila melanogaster S2 cell lines. Our results offer two key insights.
First, we demonstrate that online cvxNDL retains the accuracy of classical DL (MF) methods while simultaneously ensuring unique interpretability and scalability. Second, we identify distinct collections of proximal and distal interaction patterns involving chromatin elements shared by related processes across different chromosomes, as well as patterns unique to specific chromosomes. To associate the dictionary elements with biological properties of the corresponding chromatin regions, we employ Gene Ontology (GO) enrichment analysis and perform multiple RNA coexpression studies.
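The idea that a dictionary element is a convex combination of real measurements can be illustrated with a minimal sketch. This is not the authors' implementation of online cvxNDL: the function names `project_to_simplex` and `convex_atom`, the toy data, and the choice of a Euclidean simplex projection to enforce the convexity constraint are all assumptions made for illustration.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex
    (non-negative entries that sum to 1). Hypothetical helper."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the largest valid k
    return [max(x - theta, 0.0) for x in v]


def convex_atom(samples: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Build one dictionary element as a convex combination of real
    samples. Weights are projected onto the simplex first, so the
    element always lies in the convex hull of the samples."""
    w = project_to_simplex(weights)
    dim = len(samples[0])
    return [sum(w_i * x[j] for w_i, x in zip(w, samples))
            for j in range(dim)]


# Toy example (made-up numbers, not ChIA-Drop data).
samples = [[0.0, 0.0], [2.0, 4.0]]
atom = convex_atom(samples, [0.5, 0.5])
```

Because the weights are non-negative and sum to one, each element can be read directly as a mixture of observed samples, which is the interpretability property the abstract describes.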
-
Schlick, Tamar (Ed.) A model for DNA and nucleosomes is introduced with the goal of studying chromosomes from the single-base level all the way to higher-order chromatin structures. This model, dubbed the Widely Editable Chromatin Model (WEChroM), reproduces the complex mechanics of the double helix, including its bending and twisting persistence lengths and the temperature dependence of the former. The WEChroM Hamiltonian is composed of chain connectivity, steric interactions, and associative memory terms representing all remaining interactions leading to the structure, dynamics, and mechanical characteristics of B-DNA. Several applications are discussed to demonstrate the model's applicability. WEChroM is used to investigate the behavior of circular DNA in the presence of positive and negative supercoiling. We show that it recapitulates the formation of plectonemes and of structural defects that relax mechanical stress. The model spontaneously manifests an asymmetric behavior with respect to positive or negative supercoiling, similar to what was previously observed in experiments. Additionally, we show that the associative memory Hamiltonian is also capable of reproducing the free energy of partial DNA unwrapping from nucleosomes. WEChroM is designed to emulate the continuously variable mechanical properties of the 10 nm fiber and, by virtue of its simplicity, is ready to be scaled up to molecular systems large enough to investigate the structural ensembles of genes. WEChroM is implemented in the OpenMM simulation toolkit and is freely available for public use.
